CIRCUIT AND METHOD OF CRYPTOGRAPHIC MULTIPLICATION

Background of the Invention 

The present invention relates, in general, to
public-key cryptography and, more particularly, to a
public-key cryptographic integrated circuit.

Rivest-Shamir-Adleman (RSA) and Elliptic Curve
Cryptography (ECC) are public-key cryptographic
algorithms that provide high security for digital data
transfers between electronic devices. The modular
mathematics of the RSA and ECC (Fp) algorithms can be
computed on one hardware multiplier, and the polynomial
mathematics of the ECC (F2^M in polynomial basis)
algorithm can be computed on a different hardware
multiplier. Both hardware multiplier architectures can use
pipelining techniques for the massively parallel
computations of the algorithms. The pipelined multiplier
offers the lower power consumption that many
applications require.

Hardware implementations for computing the RSA and ECC
algorithms are not straightforward. Thus, the type of
cryptography best suited to the system application
defines the appropriate hardware multiplier architecture
for computing the desired RSA or ECC algorithm. With
increasing demand for faster cryptographic operations and
higher performance, improvements in hardware modular
multiplier architecture are needed to ensure high
levels of security.

Accordingly, it would be advantageous to provide
cryptography in a multiplication system that achieves
high performance, low cost, and low power for
implementation in an integrated circuit. It would be a
further advantage for the multiplication system to
compute both the RSA and ECC algorithms.

Brief Description of the Drawings

FIG. 1 is a block diagram illustrating one
embodiment of an integrated cryptographic system having
an RSA arithmetic processor and a separate ECC arithmetic
processor;

FIG. 2 is a block diagram illustrating another
embodiment of an integrated cryptographic system having a
single processor for computing algorithms for both RSA
and ECC data cryptography;

FIG. 3 is a schematic diagram showing one embodiment 
of a portion of the single processor of FIG. 2; 

FIG. 4 is a schematic diagram showing another
embodiment of a portion of the single processor of
FIG. 2;

FIG. 5 is a schematic diagram showing a portion of a
multiplier for computing the ECC algorithm (F2^M in the
polynomial basis);

FIG. 6 is a block diagram that illustrates a 1 x N
multiplier for computing either the RSA or the ECC
algorithm;

FIG. 7 is a schematic diagram of a cell used in the
C-register of the multiplier of FIG. 6 for single-cycle
multiplication operations; and

FIG. 8 is a schematic diagram of another cell used 
in the C-register of the multiplier of FIG. 6 for two- 
cycle multiplication operations. 

Detailed Description of the Preferred Embodiment 

Generally, the integrated cryptographic circuit of the
present invention provides cryptographic functions
that support the Rivest-Shamir-Adleman (RSA) and Elliptic
Curve Cryptography (ECC) algorithms. The cryptographic
integrated circuit has applications in internet commerce,
paging, cellular phones, smartcards, and smartcard
terminals, among others. Data such as personal health
records, financial records, fingerprints, and retina
prints are encrypted using functions that include integer
modular multiplication, modular polynomial
multiplication, addition, subtraction, and
exponentiation. The integrated cryptographic circuit
provides a hardware architecture that efficiently
computes both the RSA and the ECC algorithms.

FIG. 1 is a block diagram illustrating an embodiment
of an integrated cryptographic system 10 having an RSA
arithmetic processor 18 and a separate ECC arithmetic
processor 20. The single-chip cryptographic system 10 is
configured to operate in a data communication network and
perform cryptographic functions using either the RSA or
ECC algorithms. Cryptographic system 10 includes a Host
Interface block 12 having an input connected to an
INTERFACE BUS. Data signals are transmitted and received
via the INTERFACE BUS to and from other electronic devices
(not shown) outside cryptographic system 10. By way of
example, a microprocessor, a Random Access Memory (RAM),
a Read Only Memory (ROM), a Memory Access Controller
(MAC), a Secure Memory Management Unit (SMMU), and a
Universal Asynchronous Receive/Transmit (UART) block are
electronic devices external to cryptographic system 10
that provide and control data at the terminals of Host
Interface block 12. The blocks external to cryptographic
system 10 are not shown in the figures.

Cryptographic system 10 further includes a temporary
storage memory 14 having an input connected to Host
Interface block 12. Temporary storage memory 14 receives
data values that allow cryptographic system 10 to perform
public-key cryptographic functions. Thus, memory 14
stores the data values that support the RSA modular
exponentiation performed by RSA arithmetic processor 18
and, in addition, the data values that support the
elliptic curve point multiplication performed by ECC
arithmetic processor 20.

Specifically, for the RSA modular exponentiation,
memory 14 stores data values such as a modulus value N,
operand values A and B, exponent values, and partial
product values. In addition, for the ECC elliptic curve
point multiplication, memory 14 stores data values such
as an irreducible polynomial, a value for odd prime
fields, an ECC system-wide parameter for the generator
point, elliptic curve coefficients, a point scalar, and
temporary values.

Typically, the storage capacity of memory 14 roughly
supports a four-to-one key size ratio of RSA to ECC. For
example, if the memory supports an RSA key size of 1024
bits, then the same memory can support an ECC key size
of up to approximately 256 bits. Thus, memory 14 provides
for a lower level of security when using the RSA
algorithm compared to using the ECC algorithm. By using
memory 14 to store data values for both RSA arithmetic
processor 18 and ECC arithmetic processor 20, the silicon
area and total cost of cryptographic system 10 are
reduced.

Similar types of software instructions can be used
for computing both the ECC and RSA algorithms. By way of
example, the RSA algorithm uses the binary
square-and-multiply routine in computing exponential
functions while the ECC algorithm uses the double-and-add
routine in the computation of point multiplies. Thus,
similar software routines support mathematical operations
using either the RSA or ECC algorithm. Similarities can
also be found between the multiplies of the respective
algorithms, e.g., integer modulo-N multiplies for RSA and
modular multiplies in the polynomial basis for ECC.
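
The shared loop structure of the two routines can be sketched in Python. This is only an illustrative sketch, not the software of the described system. The helper name `binary_method` is hypothetical, and the integers modulo m under addition are used here as a stand-in group; an actual ECC routine would substitute elliptic curve point doubling and point addition.

```python
def binary_method(k, base, identity, double, combine):
    """Left-to-right binary scan of k, shared by both routines."""
    acc = identity
    for bit in bin(k)[2:]:          # most significant bit first
        acc = double(acc)           # square (RSA) or double (ECC)
        if bit == "1":
            acc = combine(acc, base)  # multiply (RSA) or add (ECC)
    return acc


def square_and_multiply(base, exponent, modulus):
    """RSA-style modular exponentiation: base**exponent mod modulus."""
    return binary_method(
        exponent, base % modulus, 1 % modulus,
        lambda x: (x * x) % modulus,
        lambda x, y: (x * y) % modulus,
    )


def double_and_add(scalar, element, modulus):
    """Scalar multiple in a stand-in additive group (integers mod modulus).

    Assumption: a real ECC implementation replaces the two lambdas with
    elliptic curve point doubling and point addition.
    """
    return binary_method(
        scalar, element % modulus, 0,
        lambda p: (2 * p) % modulus,
        lambda p, q: (p + q) % modulus,
    )
```

Both callers walk the bits of the exponent or scalar in the same way; only the two group operations differ, which is the similarity the text relies on.
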

In operation, the data values stored in memory 14
are transferred to RSA arithmetic processor 18 or to ECC
arithmetic processor 20. A control circuit 16 provides
control signals that manage the transfer of data values
between memory 14, RSA arithmetic processor 18, ECC
arithmetic processor 20, and Host Interface block 12. In
addition, the control signals generated in control circuit
16 control the mathematical computations that are
provided by RSA arithmetic processor 18 and ECC
arithmetic processor 20 in the processing of data. Put
another way, a control signal from control circuit 16
enables RSA arithmetic processor 18 for computing the RSA
algorithm or ECC arithmetic processor 20 for computing
the ECC algorithm. The similarities that exist between
the RSA and ECC algorithms reduce the number of control
signals generated by control circuit 16.

FIG. 2 is a block diagram illustrating another
embodiment of an integrated cryptographic system 24
having a single arithmetic processor 22 for computing RSA
and ECC algorithms. It should be noted that the same
reference numbers are used in the figures to denote the
same elements. This embodiment of cryptographic system
24 connects other electronic devices (not shown) to Host
Interface block 12 through an INTERFACE BUS. Data
signals are transferred through Host Interface block 12
to temporary storage memory 14 for storing data. Control
circuit 16 provides control signals to arithmetic
processor 22 that manage the transfer of data values from
temporary storage memory 14 and control the functions
provided by arithmetic processor 22. One such control
signal generated by control circuit 16 is the INT/POLY
signal that selects or enables arithmetic processor 22 to
perform the mathematical operations of either the RSA
algorithm or the ECC algorithm. Thus, arithmetic processor 22
provides cryptographic functions based either on RSA
modular exponentiation or ECC elliptic curve point
multiplication.

FIG. 3 is a schematic diagram showing one embodiment
of a portion of the single arithmetic processor 22 of
FIG. 2. Arithmetic processor 22 performs a
multiplication of operands A and B and supplies a product
value, i.e., Pi+0, Pi+1, Pi+2, and Pi+3, to a modulo
reducer 60. Operands A and B can be numerical data or
plain text strings that are converted to ordinal numbers
using American Standard Code for Information Interchange
(ASCII) or other transformed character sets.

Modulo reducer 60 of arithmetic processor 22
includes an adder array having X columns and Y rows,
where X and Y are integer numbers. The preferred
embodiment of the adder array has sixteen columns and
sixteen rows. However, it should be noted that the
present invention is not limited to an adder array having
sixteen columns and sixteen rows or to an array having
matching numbers of rows and columns. For simplicity and
illustrative purposes, modulo reducer 60 is described
as a four-by-four array of adders along with
associated logic.

Adders 90, 92, 94, and 96 are in column X0, adders
100, 102, 104, and 106 are in column X1, adders 110, 112,
114, and 116 are in column X2, and adders 120, 122, 124,
and 126 are in column X3 of the adder array of modulo
reducer 60. Adders 90-96, 100-106, 110-116, and 120-126
each have first and second data inputs, a carry input
(CI), a carry output (CO), and a sum output (S).

The first inputs of adders 90, 92, 94, and 96 in
column X0 are connected to respective input terminals 80,
82, 84, and 86. Two-input AND-gates 89, 91, 93, and 95
each have a first input commonly connected to each other
and to a Q output of a latch 128. The outputs of
AND-gates 89, 91, 93, and 95 are connected to the second
inputs of adders 90, 92, 94, and 96, respectively. In
addition, a carry output (CO) of adder 90 is coupled
through an AND-gate 90A to a carry input (CI) of adder
92, a carry output of adder 92 is coupled through an
AND-gate 92A to a carry input of adder 94, and a carry output
of adder 94 is coupled through an AND-gate 94A to a carry
input of adder 96. The carry output of adder 96 is
coupled through an AND-gate 96A to a data input of a
latch 152. The output of latch 152 is connected to the
carry input of adder 90.

Logic gates such as, for example, AND-gates 90A,
92A, 94A, and 96A are also referred to as blocking
circuits. When the select or enable signal common to all
of the blocking circuits has a logic one value, the
carry-in signal is transferred through the blocking
circuit. On the other hand, when the select or enable
signal has a logic zero value, the carry-in signal is
blocked or inhibited from propagating through the
blocking circuit.

The first inputs of adders 100, 102, 104, and 106 in
column X1 are connected to the respective outputs of
adders 90, 92, 94, and 96 in column X0. Two-input
AND-gates 99, 101, 103, and 105 have a first input commonly
connected to each other and to a Q output of a latch 132.
The outputs of AND-gates 99, 101, 103, and 105 are
connected to the second inputs of adders 100, 102, 104,
and 106, respectively. In addition, a carry output of
adder 100 is coupled through an AND-gate 100A to a carry
input of adder 102, a carry output of adder 102 is
coupled through an AND-gate 102A to a carry input of
adder 104, and a carry output of adder 104 is coupled
through an AND-gate 104A to a carry input of adder 106.
The carry output of adder 106 is coupled through an
AND-gate 106A to a data input of a latch 156. The output of
latch 156 is connected to the carry input of adder 100.

The first inputs of adders 110, 112, 114, and 116 in
column X2 are connected to the respective outputs of
adders 100, 102, 104, and 106 in column X1. Two-input
AND-gates 109, 111, 113, and 115 have a first input
commonly connected to each other and to a Q output of a
latch 136. The outputs of AND-gates 109, 111, 113, and
115 are connected to the second inputs of adders 110,
112, 114, and 116, respectively. In addition, a carry
output of adder 110 is coupled through an AND-gate 110A
to a carry input of adder 112, a carry output of adder
112 is coupled through an AND-gate 112A to a carry input
of adder 114, and a carry output of adder 114 is coupled
through an AND-gate 114A to a carry input of adder 116.
The carry output of adder 116 is coupled through an
AND-gate 116A to a data input of a latch 160. The output of
latch 160 is connected to the carry input of adder 110.

The first inputs of adders 120, 122, 124, and 126 in
column X3 are connected to the respective outputs of
adders 110, 112, 114, and 116 in column X2. Two-input
AND-gates 119, 121, 123, and 125 have a first input
commonly connected to each other and to a Q output of a
latch 140. The outputs of AND-gates 119, 121, 123, and
125 are connected to the second inputs of adders 120,
122, 124, and 126, respectively. In addition, a carry
output of adder 120 is coupled through an AND-gate 120A
to a carry input of adder 122, a carry output of adder
122 is coupled through an AND-gate 122A to a carry input
of adder 124, and a carry output of adder 124 is coupled
through an AND-gate 124A to a carry input of adder 126.
The carry output of adder 126 is coupled through an
AND-gate 126A to a data input of a latch 162. The output of
latch 162 is connected to the carry input of adder 120.
The outputs S of adders 120, 122, 124, and 126 are
connected to respective output terminals 164, 166, 168,
and 170. AND-gates 90A-96A, 100A-106A, 110A-116A, and
120A-126A are enabled when arithmetic processor 22 is
computing integer-modulo-N multiplications and not
enabled when the arithmetic processor is computing
modular polynomial-basis multiplications. In other
words, the carryout signal of respective adders 90-96,
100-106, 110-116, and 120-126 is not propagated when the
modular polynomial-basis multiplication algorithm is
being computed. The letter "A" has been appended to the
reference number of the AND-gates to signify that each
adder, such as adder 90, has a corresponding AND-gate,
i.e., 90A, that either passes or blocks the carry output
of that adder from being transferred to the carry input
of an adjacent adder.
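
The effect of the blocking circuits on one adder column can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the circuit itself: the function name `column_add` and its parameters are hypothetical, and the latches that carry a column's final carry into a later cycle are not modeled. With the carries enabled the column performs ordinary integer addition; with the carries blocked each bit position reduces to an exclusive-OR, which is addition of polynomials with binary coefficients.

```python
def column_add(first, second, width, carry_enable, carry_in=0):
    """Add two width-bit values through a chain of full adders.

    carry_enable models the common enable on the blocking AND-gates:
    1 passes each carry output to the next adder, 0 blocks it.
    Returns (sum_bits, final_carry).
    """
    result = 0
    carry = carry_in
    for i in range(width):
        x = (first >> i) & 1
        y = (second >> i) & 1
        s = x ^ y ^ carry                            # sum output S
        c_out = (x & y) | (x & carry) | (y & carry)  # carry output CO
        result |= s << i
        carry = c_out & carry_enable                 # blocking AND-gate
    return result, carry


# Integer mode: 0111 + 1101 = 1 0100 (sum bits 0100, carry out 1).
integer_sum = column_add(0b0111, 0b1101, 4, carry_enable=1)

# Polynomial mode: carries blocked, so the result is 0111 XOR 1101.
poly_sum = column_add(0b0111, 0b1101, 4, carry_enable=0)
```
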

Further, the second inputs of AND-gates 89, 101,
113, and 125 are commonly connected to each other and to
input terminal 81. The second inputs of AND-gates 91,
103, and 115 are commonly connected to each other, to an
input of a latch 158, and to input terminal 83. The
second inputs of AND-gates 93 and 105 are commonly
connected to each other, to an input of a latch 154, and
to input terminal 85. The second input of AND-gate 95 is
commonly connected to an input of a latch and to input
terminal 87. The second inputs of AND-gates 99, 111, and
123 are commonly connected to each other and to an output
of latch 150. The second inputs of AND-gates 109 and 121
are commonly connected to each other and to an output of
latch 154. The second input of AND-gate 119 is connected
to an output of latch 158.

Latches 128, 132, 136, and 140 each have a set input
(S), a reset input (R), and an output (Q). Latches 128,
132, 136, and 140 are enabled when signal T is high,
causing the signal at output Q to have the same value as
the signal at input S. The signals at the Q outputs are
latched when the signal T transitions from a high to a
low logic value. The signal at input R resets the
signals at the Q outputs. The reset inputs R of latches
128, 132, 136, and 140 are commonly connected to each
other and to a terminal 79. Terminal 79 is coupled for
receiving a reset signal R. A two-input AND-gate 130 has
an output connected to the set input of latch 128. The
first input of AND-gate 130 is connected to the first
input of adder 90. A two-input AND-gate 134 has an
output connected to the set input of latch 132. The
first input of AND-gate 134 is connected to the first
input of adder 102. A two-input AND-gate 138 has an
output connected to the set input of latch 136. The
first input of AND-gate 138 is connected to the first
input of adder 114. A two-input AND-gate 142 has an
output connected to the set input of latch 140. The
first input of AND-gate 142 is connected to the first
input of adder 126. The second inputs of AND-gates 130,
134, 138, and 142 are commonly connected to each other
and to terminal 78. Terminal 78 is coupled for receiving
a signal T.

Large operands such as, for example, two 1024-bit
operands are multiplied using pipelining techniques and
multiple passes or rotations through a multiplier (not
shown). Typically, the larger operands A and B are
segmented into smaller groups that are referred to as
digits, e.g., digits A0-An and B0-Bn. The pipelined
multiplier has an array size that is appropriate for
multiplying the digits. By way of example, the digits
A0-An and B0-Bn are 16-bit binary numbers and the
multiplier is a 16-bit multiplier, although this is not a
limitation of the present invention.
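
The segmentation of a large operand into digits can be sketched in Python as follows. The function names and the sample operand are hypothetical choices made for this illustration; the 16-bit digit size and 1024-bit operand size follow the example in the text.

```python
def split_digits(value, digit_bits=16, total_bits=1024):
    """Split an operand into digits, least significant digit first."""
    mask = (1 << digit_bits) - 1
    count = total_bits // digit_bits
    return [(value >> (i * digit_bits)) & mask for i in range(count)]


def join_digits(digits, digit_bits=16):
    """Reassemble an operand from its digits (inverse of split_digits)."""
    value = 0
    for i, digit in enumerate(digits):
        value |= digit << (i * digit_bits)
    return value


# A sample 1024-bit operand (128 bytes); the value itself is arbitrary.
operand_a = int.from_bytes(bytes(range(128)), "big")
digits_a = split_digits(operand_a)   # A0 is digits_a[0], A63 is digits_a[63]
```

With these sizes a 1024-bit operand yields 64 digits, so the digit index n in A0-An is 63.
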

In general, integer-modulo-N Montgomery
multiplications take the form of:

(A*R mod N)(B*R mod N) + μ*N

where:
A is the first operand and an integer;
B is the second operand and an integer;
N is an integer having an odd value;
mod N is a remainder value of (A*B*R)/N that defines
the number of elements in the finite field;
R is an integer power of two having a value
greater than the value of N; and
μ is a reduction value that is computed such that
(A*R mod N)(B*R mod N) + μ*N is an integer that can be
divided by R without a loss of significant bits.
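
The role of the reduction value can be checked numerically with a short Python sketch. The closed-form computation below (using a modular inverse of N) is a standard way to obtain such a reduction value and is shown only for illustration; it is not the method of the described circuit, which determines the value during the reduction itself. The function name and the variable `mu` for the reduction value are assumptions of this sketch, and the numbers are the small values A = 9, B = 11, N = 13, R = 16 used in the worked example of this description.

```python
def montgomery_product(a_bar, b_bar, n, r):
    """Montgomery product of a_bar = A*R mod N and b_bar = B*R mod N.

    Returns (mu, result), where mu makes a_bar*b_bar + mu*n divisible
    by r and result = A*B*R mod N.
    Requires Python 3.8+ for pow(n, -1, r).
    """
    t = a_bar * b_bar
    n_inv = pow(n, -1, r)          # inverse of N modulo R (N is odd)
    mu = (-t * n_inv) % r          # reduction value
    total = t + mu * n
    assert total % r == 0          # division by R loses no bits
    result = total // r
    if result >= n:                # final conditional subtraction
        result -= n
    return mu, result


a, b, n, r = 9, 11, 13, 16
mu, product = montgomery_product((a * r) % n, (b * r) % n, n, r)
```

For these values the sketch gives a reduction value of 13 (1101 in binary) and a result equal to A*B*R mod N.
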

In operation, modulo reducer 60 receives the product
of (A*R mod N) and (B*R mod N) and generates reduced
partial product outputs for integer-modulo-N
multiplications. For simplicity and illustrative
purposes, integer-modulo-N multiplications are described
using the following example for four-bit numbers.
Referring to FIG. 3, input terminals 80, 82, 84, and 86
receive the respective product terms Pi+0, Pi+1, Pi+2, and
Pi+3 that result from multiplying operands such as, for
example, operands A0 and B0. In addition, input
terminals 81, 83, 85, and 87 receive the values Ni+0,
Ni+1, Ni+2, and Ni+3, i.e., values for the integer N.
Modulo reducer 60 generates a reduced product term for
modular multiplication at output terminals 164-170.

Modulo reducer 60 implements the Foster-Montgomery
Reduction Algorithm. In the Foster-Montgomery Reduction
Algorithm, the logic values at particular bit locations
determine whether the value of N is aligned and added to
a summed value. The architecture of modulo reducer 60
allows the value of N to be both aligned and added to the
summed value when the logic value at a particular bit
location has a logic one value. By aligning and adding
the value of N, the value of μ is determined and stored
in latches 128, 132, 136, and 140. In other words, the
value of μ is determined during the reduction process
that generates the reduced product term at output
terminals 164-170 and not prior to the multiplication of
digits A0 and B0.

An example is described where the term (A*R mod N)
has the value of 0001 in base two when, in base ten,
A = 9, R = 16, and N = 13. Further, the term (B*R mod
N) has the value of 0111 when, in base ten, B = 11,
R = 16, and N = 13. Note that operands A0 and B0 are
pre-multiplied by R for Montgomery multiplication to
simplify the hardware modular reduction problem. When the
operands (A*R mod N) and (B*R mod N) are multiplied, the
product terms Pi+3, Pi+2, Pi+1, and Pi+0 have the
respective value of 0111.

Initially, a reset signal at terminal 79 causes the
Q outputs of latches 128, 132, 136, and 140 to have logic
zero values. AND-gate 130 receives the product term Pi+0,
having a logic one value, at one input and the signal T,
having a logic one value, at the other input. The output
of AND-gate 130 generates a logic one value that causes
latch 128 to set, i.e., the signal at the Q output has a
logic one value. It should be noted that the signal T
has a logic one value during the time that operands A0
and B0, i.e., the lower order digits of operands A and B,
are multiplied together. It should be further noted that
the logic one value at the Q output of latch 128 causes
AND-gates 89, 91, 93, and 95 to be enabled and pass the
values Ni+0, Ni+1, Ni+2, and Ni+3 to the second inputs of
adders 90, 92, 94, and 96, respectively. Thus, the
adders located in column X0 generate output signals that
are the sum of the values Ni+0, Ni+1, Ni+2, and Ni+3 and
the corresponding values of Pi+0, Pi+1, Pi+2, and Pi+3.

The logic one values at the first and second inputs
of adder 90 cause output S to supply a logic zero value.
Further, adder 90 generates a carry signal at output CO.
Adder 92 receives a logic one value at the first input, a
logic zero value at the second input, and a logic one
value for the carry signal at input CI. The signal at
output S of adder 92 has a logic zero value and the carry
signal at output CO has a logic one value.

Adder 94 receives a logic one at the first input, a
logic one at the second input from AND-gate 93, and a
carry signal enabled through AND-gate 92A. The output S
of adder 94 has a logic one value and the carryout signal
has a logic one value at the carry output CO. Likewise,
adder 96 receives a logic zero at the first input, a
logic one at the second input from AND-gate 95, and a
carry signal enabled through AND-gate 94A. The output
signal at output S of adder 96 has a logic zero value and
the carry signal at the carry output CO has a logic one
value. In accordance with the Foster-Montgomery
Reduction Algorithm, the particular bit location having a
logic one value, i.e., the least significant bit location
at input terminal 80, causes the value N to be aligned
and added to the value P.

Again, according to the Foster-Montgomery Reduction
Algorithm, the data generated by the adders in column X1
have values that depend on the data at a particular data
bit location. The particular data bit location in this
instance corresponds with the output S of adder 92. It
should be noted that an input of AND-gate 134 receives a
logic zero value from the signal at output S of adder 92.
Latch 132 is not set and the Q output of latch 132
remains a logic zero value. AND-gates 99, 101, 103, and
105 generate a logic zero value at the second inputs of
adders 100, 102, 104, and 106, respectively. Adder 100
has logic zero values at both the first and second inputs
and generates a logic zero value at output S. Likewise,
adder 102 has logic zero values at both the first and
second inputs and generates a logic zero value at output
S. Adder 104 has a logic one value at the first input
and a logic zero value at the second input and generates
a logic one value at output S. Adder 106 has logic zero
values at both the first and second inputs and generates
a logic zero value at output S. Thus, adders 106, 104,
102, and 100 in column X1 generate a respective value of
0100.

The data generated by the adders in column X2 have
values that also depend on the data at a particular data
bit location. The particular data bit in this instance
is the logic value at the output of adder 104. It should
be noted that an input of AND-gate 138 receives a logic
one value from the signal at output S of adder 104. The
logic one value at the output of AND-gate 138 causes
latch 136 to set and the Q output of latch 136 to have a
logic one value. AND-gates 109, 111, 113, and 115 are
enabled by the logic one value generated by latch 136.
The data at the outputs of adders 100, 102, 104,
and 106 is transferred to the first inputs of adders
110, 112, 114, and 116, respectively. Adder 110 has
logic zero values at both the first and second inputs and
generates a logic zero value at output S. Likewise,
adder 112 has logic zero values at both the first and
second inputs and generates a logic zero value at output
S. Adder 114 has logic one values at both the first and
second inputs and generates a logic zero value at output
S and a logic one value for the carryout signal at output
CO. Adder 116 has logic zero values at both the first
and second inputs and a logic one value is transferred
through AND-gate 114A to the carry input of adder 116. A
logic one value is generated at output S of adder 116.
Thus, adders 116, 114, 112, and 110 in column X2 generate
a respective value of 1000.

The data generated by adders 120, 122, 124, and 126
in column X3 have values that also depend on the data at
a particular data bit location. The particular data bit
in this instance is the logic value at the output of
adder 116. An input of AND-gate 142 receives a logic one
value from the signal at output S of adder 116. AND-gate
142, receiving a logic one value from adder 116 and a logic
one value for the signal T, causes latch 140 to set. The
Q output of latch 140 has a logic one value, which enables
AND-gates 119, 121, 123, and 125. The data at the
outputs of adders 110, 112, 114, and 116 is transferred
to the first inputs of adders 120, 122, 124, and 126,
respectively. Adder 120 has logic zero values at both
the first and second inputs and generates a logic zero
value at output S. Likewise, adder 122 has logic zero
values at both the first and second inputs and generates
a logic zero value at output S. Adder 124 also has logic
zero values at both the first and second inputs and
generates a logic zero value at output S. Adder 126 has
logic one values at both the first and second inputs and
generates a logic zero value at output S and a logic one
value as the carryout signal at the carry output. Thus,
adders 126, 124, 122, and 120 in column X3 generate a
respective value of 0000 at output terminals 164-170.

During the reduction process that causes the first
partial product of A0 and B0 to have a value of zero, the
appropriate latches 128, 132, 136, and 140 have been set
and contain the value 1101 for μ that is used in
subsequent pipelined multiplications. Following the
reduction of the first partial product to zero, the
signal T transitions from a logic one to a logic zero
value and stores the value of μ in latches 128, 132, 136,
and 140. The stored value of μ, the next digit of N, and
the products of the digits B1-B63 with A1-A63 are used by
modulo reducer 60 to complete the polynomial
multiplication.
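
The column-by-column example above can be reproduced with a small behavioral model in Python. This is a sketch under stated assumptions rather than a model of the actual circuit: it works on whole integers instead of individual adders, latches, and carry paths, and the function name `bit_serial_reduce` and variable `mu` (for the reduction value) are hypothetical. At each bit position it inspects the running sum and, when that bit is a logic one, adds the value N aligned to that position and records a one in the reduction value.

```python
def bit_serial_reduce(p, n, width):
    """Zero the low `width` bits of p by conditionally adding shifted n.

    Returns (total, mu, trace), where total = p + mu*n and trace holds
    the low `width` bits of the running sum after each column.
    """
    mask = (1 << width) - 1
    total = p
    mu = 0
    trace = []
    for i in range(width):
        if (total >> i) & 1:        # bit that sets the column's latch
            total += n << i         # align N to bit i and add it
            mu |= 1 << i            # latch a one for this column
        trace.append(total & mask)  # column outputs (low four bits)
    return total, mu, trace


# Values from the example: P = 0111 and N = 1101 (thirteen).
total, mu, trace = bit_serial_reduce(0b0111, 0b1101, 4)
```

For these inputs the trace is 0100, 0100, 1000, 0000 for columns X0 through X3, and the recorded reduction value is 1101, in agreement with the values stated in the example.
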

FIG. 4 is a schematic diagram showing a multiplier
structure 171 as a portion of another embodiment of
single arithmetic processor 22 of FIG. 2. Multiplier
structure 171 performs mathematical operations in support
of integer-modulo-N multiplications and modular
polynomial-basis multiplications. For simplicity and
illustrative purposes, multiplier structure 171 is
described as a four-by-four array of
adders. Although multiplier structure 171 is described
as an array of adders having the same number of rows and
columns, this is not a limitation of the present
invention.

Multiplier structure 171 has adders 90, 92, 94, and
96 in column X0, adders 100, 102, 104, and 106 in column
X1, adders 110, 112, 114, and 116 in column X2, and
adders 120, 122, 124, and 126 in column X3. In addition,
latches 152, 156, 160, and 162 store carryout signals
that are used in computing integer-modulo-N
multiplications for generating the next partial product.
Latches 150, 154, and 158 store data bits of operand B,
and latches 226, 228, and 230 store data bits of the
value N for use in generating the next partial product.

The multiplexers (muxes) in multiplier structure 171
each have four inputs, an output, and two selector
inputs. Multiplexers 172-178, 182-188, 192-198, and
202-208 are illustrated as having outputs connected to the
first inputs of the adders, although it should be noted
that the outputs of the multiplexers could be connected
to the second inputs of the adders. The signals on the
first and second selector inputs of the muxes select a
signal at one of the four mux inputs for transfer to the
mux output. The output signals from muxes 172-178 are
transferred to the first inputs of adders 90-96,
respectively. The output signals from muxes 182-188 are
transferred to the first inputs of respective adders
100-106. The output signals from muxes 192-198 are
transferred to the first inputs of adders 110-116,
respectively. The output signals from muxes 202-208 are
transferred to the first inputs of adders 120-126,
respectively.

Further, the first selector inputs of muxes 172-178
are commonly connected to each other and receive the
signal A(bit 0). The second selector inputs of muxes
172-178 are commonly connected to each other and to an output
of a latch 212. The first selector inputs of muxes
182-188 are commonly connected to each other and receive the
signal A(bit 1). The second selector inputs of muxes
182-188 are commonly connected to each other and to an output
of a latch 216. The first selector inputs of muxes
192-198 are commonly connected to each other and receive the
signal A(bit 2). The second selector inputs of muxes
192-198 are commonly connected to each other and to an output
of a latch 220. The first selector inputs of muxes
202-208 are commonly connected to each other and receive the
signal A(bit 3). The second selector inputs of muxes
202-208 are commonly connected to each other and to an output
of a latch 224.

A first input of muxes 172-178, 182-188, 192-198,
and 202-208 is commonly coupled for receiving a logic
zero value. The second inputs of muxes 172, 174, 176, and
178 receive the respective values B(bit 0), B(bit 1),
B(bit 2), and B(bit 3). The third inputs of muxes 172, 174,
176, and 178 receive the respective values N(bit 0),
N(bit 1), N(bit 2), and N(bit 3). The fourth inputs of
muxes 172, 174, 176, and 178 receive the summed value of
the respective values for N and B. Thus, the fourth
input of each mux receives the logical summed value of
the values supplied at the second and third inputs of
that mux.

When the first and second selector inputs of the 
muxes receive respective logic values of 00, the signals 
at the first inputs of muxes 172-178, 182-188, 192-198, 
and 202-208 are transferred to the outputs of the 
corresponding muxes. When the first and second selector 
inputs receive respective logic values of 01, the signals 
at the second inputs of muxes 172-178, 182-188, 192-198, 
and 202-208 are transferred to the outputs of the 
corresponding muxes. When the first and second selector 
inputs receive respective logic values of 10, the signals 
at the third inputs of muxes 172-178, 182-188, 192-198, 
and 202-208 are transferred to the outputs of the 
corresponding muxes. When the first and second selector 
inputs receive respective logic values of 11, the signals 
at the fourth inputs of muxes 172-178, 182-188, 192-198, 
and 202-208 are transferred to the outputs of the 
corresponding muxes. 
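
The selection rule described above can be sketched in a few lines of Python. This is only an illustrative model: the function name mux4 and the tuple ordering of the inputs are assumptions made for the sketch and do not appear in the described circuit.

```python
def mux4(sel_first: int, sel_second: int, inputs: tuple):
    """Return the mux input chosen by the two selector bits.

    Selector values 00, 01, 10, and 11 (first selector, then second
    selector) pick the first, second, third, and fourth input.
    """
    index = (sel_first << 1) | sel_second
    return inputs[index]


# Hypothetical labels standing in for the four mux input signals.
example_inputs = ("zero", "B", "N", "N+B")
selected = [mux4(s1, s2, example_inputs) for s1 in (0, 1) for s2 in (0, 1)]
```

The labels in example_inputs only mark which of the four inputs is passed to the output for each selector combination.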

Latches 212, 216, 220, and 224 latch a data signal 
from respective logic circuits 210, 214, 218, and 222 
when the signal T transitions from a logic one to a logic 
zero value. The data signal generated by logic circuit 
210 is the product of the signals A (BIT 0) and B (BIT 0) 
exclusive or'ed with P(0), where P(0) is the least 
significant bit of the previous partial product value. 
The data signal generated by logic circuit 214 is the 
product of the signals A (BIT 1) and B (BIT 0) exclusive 
or'ed with the output signal from adder 92. The data 
signal generated by logic circuit 218 is the product of 
the signals A (BIT 2) and B (BIT 0) exclusive or'ed with 
the output signal from adder 104. The data signal 
generated by logic circuit 222 is the product of the
signals A (BIT 3) and B (BIT 0) exclusive or'ed with the
output signal from adder 116.

AND-gates 90A-96A are located in the carry chain path of
the adders in column X0. Thus, AND-gates 90A-96A either
enable or disable signals from propagating in the carry
chain of column X0. Likewise, AND-gates 100A-106A are
located in the carry chain path of the adders in column X1
and either enable or disable signals from propagating in
the carry chain of column X1. AND-gates 110A-116A are
located in the carry chain path of the adders in column X2
and either enable or disable signals from propagating in
the carry chain of column X2. AND-gates 120A-126A are
located in the carry chain path of the adders in column X3
and either enable or disable signals from propagating in
the carry chain of column X3. Each AND-gate 90A-96A,
100A-106A, 110A-116A, and 120A-126A is enabled when
multiplier structure 171 is computing integer-modulo-N
multiplications and disabled when multiplier structure 171
is computing modular polynomial-basis multiplications. In
other words, the carry chain paths of multiplier structure
171 only propagate carry chain signals to adjacent adder
cells when integer-modulo-N multiplications are being
computed.
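
As a rough behavioral sketch of the gated carry chain, the following Python models one adder cell whose carry input passes through an AND-gate. The names adder_cell, add_column, and int_mode are hypothetical and chosen for illustration; the bit lists are ordered least significant bit first.

```python
def adder_cell(a: int, b: int, carry_in: int, int_mode: bool):
    """One adder cell whose carry input is gated by an AND-gate."""
    gated_carry = carry_in & int(int_mode)   # AND-gate in the carry path
    total = a + b + gated_carry
    return total & 1, total >> 1             # (sum bit S, carry out CO)


def add_column(x_bits, y_bits, int_mode: bool):
    """Add two little-endian bit lists through a chain of gated cells."""
    carry, out = 0, []
    for a, b in zip(x_bits, y_bits):
        s, carry = adder_cell(a, b, carry, int_mode)
        out.append(s)
    return out


integer_sum = add_column([1, 1, 0, 0], [1, 0, 0, 0], int_mode=True)
poly_sum = add_column([1, 1, 0, 0], [1, 0, 0, 0], int_mode=False)
```

With the gates enabled the chain behaves as an ordinary ripple-carry adder; with the gates disabled each cell reduces to an exclusive-OR of its two inputs, which is the addition used for polynomial-basis arithmetic.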

The multiplication process that generates the partial
product of digits A0 and B0 causes the logic values at
output terminals 164-170 to be reduced. Thus, the partial
product that results from digit A0 times digit B0 has all
logic zero values. In addition, latches 128, 132, 136, and
140 have been appropriately set and store the value for μ
during the multiplication of A0 and B0. During subsequent
multiply operations, the stored value of μ, along with
corresponding values of N1-N63, digits B1-B63, and digits
A1-A63 are used by multiplier structure 171 to complete
the mathematical computations for integer-modulo-N
multiplications.

Referring to FIG. 4, the following example uses the
arithmetic process for modular polynomial multiplication.
The Montgomery Reduction Algorithm for polynomial
multiplication takes the form of:

     (A*R mod N)(B*R mod N) + μ*N

where:

     A is the first operand and a polynomial;

     B is the second operand and a polynomial;

     N is an irreducible polynomial;

     mod N is a remainder value of (A*B*R)/N that defines
     the number of elements in the finite field;

     R is an integer power of two number having a value
     greater than the value of N; and

     μ is a reduction value that is computed such that
     (A*R mod N)(B*R mod N) + μ*N is an integer that can be
     divided by R without a loss of significant bits.

An example is described where the term (A*R mod N) has the
value of (x^6 + x^4) mod N = x + 1 or 011 when using base
two numbers and A = 5 (base ten) or (x^2 + 1) in polynomial
form, R = 16 (base ten) or (x^4) in polynomial form, and
N = 11 (base ten) or (x^3 + x + 1) in polynomial form.
Further, the term (B*R mod N) has the value of 101 or
(x^6) mod N = x^2 + 1 in polynomial form when B = 4 (base
ten), R = 16 (base ten), and N = 11 (base ten). Note that
digits A0 and B0 are pre-multiplied by R to simplify a
hardware modular reduction problem. When the operands
(A*R mod N) and (B*R mod N) are multiplied, the product
terms P(i+3), P(i+2), P(i+1), and P(i+0) have the
respective value of 1111. Multiplier structure 171 reduces
the product of [(A*R mod N)*(B*R mod N)] mod N by R, which
results in a value of 0111 or (x^2 + x + 1) in polynomial
form.
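
The operand values in this example can be checked with a short Python sketch that treats integers as polynomials over GF(2), bit k standing for the coefficient of x^k. The helper names clmul and polymod are assumptions introduced for illustration and are not taken from the described circuit.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less (polynomial) product of two GF(2) polynomials."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def polymod(value: int, modulus: int) -> int:
    """Remainder of value divided by modulus over GF(2)."""
    deg = modulus.bit_length() - 1
    while value.bit_length() - 1 >= deg:
        value ^= modulus << (value.bit_length() - 1 - deg)
    return value


A, B, R, N = 0b101, 0b100, 0b10000, 0b1011
a_mont = polymod(clmul(A, R), N)          # (x^6 + x^4) mod N
b_mont = polymod(clmul(B, R), N)          # (x^6) mod N
product = clmul(a_mont, b_mont)           # P(i+3) .. P(i+0)
expected = polymod(clmul(clmul(A, B), R), N)   # A*B*R mod N
```

Under these assumptions a_mont is 011, b_mont is 101, product is 1111, and expected is 0111, in agreement with the values stated in the text.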

Initially, a reset signal at terminal 79 causes the Q
outputs of latches 128, 132, 136, and 140 to have logic
zero values. AND-gate 130 receives the product term P(i+0),
having a logic one value, at one input and the signal T,
having a logic one value, at the other input. The output
of AND-gate 130 generates a logic one value that causes
latch 128 to set, i.e., the signal at the Q output has a
logic one value. It should be noted that the signal T has
a logic one value during the time that digits A0 and B0,
i.e., the lower order segment of operands A and B, are
multiplied together. It should be further noted that the
logic one value at the Q output of latch 128 causes
AND-gates 89, 91, 93, and 95 to be enabled and pass the
values N(i+0), N(i+1), N(i+2), and N(i+3) to the second
inputs of adders 90, 92, 94, and 96, respectively. Thus,
the adders located in column X0 generate output signals
that are the sum of the values N(i+0), N(i+1), N(i+2), and
N(i+3) and the corresponding values of P(i+0), P(i+1),
P(i+2), and P(i+3).

The logic one values at the first and second inputs of
adder 90 cause output S to supply a logic zero value.
Further, adder 90 generates a carry signal at output CO.
Adder 92 receives a logic one value at the first input, a
logic one value at the second input, and a logic zero
value for the carry signal at input CI (AND-gate 90A
blocks the carry signal generated by adder 90 from
propagating to adder 92). The signal at output S of adder
92 has a logic zero value and the carry signal at output
CO has a logic one value. It should be noted that AND-gate
92A blocks the carry signal generated by adder 92 from
propagating to adder 94.

Adder 94 receives a logic one at the first input, a logic
zero at the second input from AND-gate 93, and a logic
zero for the carry signal. The output S of adder 94 has a
logic one value and the carryout signal has a logic zero
value at the carry output CO. Likewise, adder 96 receives
a logic one at the first input, a logic one at the second
input from AND-gate 95, and a logic zero value for the
carry signal. The output signal at output S of adder 96
has a logic zero value and the carry signal at the carry
output CO has a logic one value. In accordance with the
Foster-Montgomery Reduction Algorithm, the particular bit
location having a logic one value, i.e., the least
significant bit location at input terminal 80, causes the
value N to be aligned and added to the value P.

According to the Foster-Montgomery Reduction Algorithm,
the data generated by the adders in column X1 have values
that depend on the data at a particular data bit location.
The particular data bit location in this instance
corresponds with the output S of adder 92. It should be
noted that an input of AND-gate 134 receives a logic zero
value from the signal at output S of adder 92. Latch 132
is not set and the Q output of latch 132 remains a logic
zero value. AND-gates 99, 101, 103, and 105 generate logic
zero values at the second inputs of adders 100, 102, 104,
and 106, respectively. Adder 100 has logic zero values at
both the first and second inputs and generates a logic
zero value at output S. Likewise, adder 102 has logic zero
values at both the first and second inputs and generates a
logic zero value at output S. Adder 104 has a logic one
value at the first input and a logic zero value at the
second input and generates a logic one value at output S.
Adder 106 has logic zero values at both the first and
second inputs and generates a logic zero value at output
S. Thus, adders 106, 104, 102, and 100 in column X1
generate a respective value of 0100.

The data generated by the adders in column X2 have values
that also depend on the data at a particular data bit
location. The particular data bit location in this
instance corresponds with the output S of adder 104. It
should be noted that an input of AND-gate 138 receives a
logic one value from the signal at output S of adder 104.
The logic one value at the output of AND-gate 138 causes
latch 136 to set and the Q output of latch 136 to have a
logic one value. AND-gates 109, 111, 113, and 115 are
enabled by the logic one value generated by latch 136.
Thus, the data at the outputs of adders 100, 102, 104, and
106 is transferred to the second inputs of adders 110,
112, 114, and 116, respectively. Adder 110 has logic zero
values at both the first and second inputs and generates a
logic zero value at output S. Likewise, adder 112 has
logic zero values at both the first and second inputs and
generates a logic zero value at output S. Adder 114 has
logic one values at both the first and second inputs and
generates a logic zero value at output S and a logic one
value for the carryout signal at output CO. The logic one
value for the carryout signal is inhibited by AND-gate
114A from propagating to adder 116. Adder 116 has a logic
zero value at the first input, a logic one value at the
second input, and a logic zero value for the carry input.
A logic one value is generated at output S of adder 116.
Thus, adders 116, 114, 112, and 110 in column X2 generate
a respective value of 1000.

The data generated by adders 120, 122, 124, and 126 in
column X3 have values that also depend on the data at a
particular data bit location. The particular data bit
location in this instance corresponds with the output S of
adder 116. An input of AND-gate 142 receives a logic one
value from the signal at output S of adder 116. AND-gate
142 having a logic one value from adder 116 and a logic
one value for the signal T causes latch 140 to set. The Q
output of latch 140 has a logic one value which enables
AND-gates 119, 121, 123, and 125. The data at the outputs
of adders 110, 112, 114, and 116 is transferred to the
first inputs of adders 120, 122, 124, and 126,
respectively. Adder 120 has logic zero values at both the
first and second inputs and generates a logic zero value
at output S. Likewise, adder 122 has logic zero values at
both the first and second inputs and generates a logic
zero value at output S. Adder 124 also has logic zero
values at both the first and second inputs and generates a
logic zero value at output S. Adder 126 has logic one
values at both the first and second inputs and generates a
logic zero value at output S and a logic one value as the
carryout signal at the carry output. AND-gate 126A
inhibits the carryout signal from propagating to a latch
162. Thus, adders 126, 124, 122, and 120 in column X3
generate a respective value of 0000 at output terminals
164-170.
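
The column-by-column walk-through above can be traced with a small Python model. It is a behavioral sketch only: the function name reduce_low_digit and the choice to track just the low four bits (the S outputs of one column of adders) are assumptions made for illustration.

```python
def reduce_low_digit(p: int, n: int, width: int = 4):
    """Trace the column-by-column reduction of one digit over GF(2).

    Column k inspects bit k of the running value. When that bit is
    one, N shifted left by k is XORed in (no carries propagate) and
    bit k of mu is set. Only the low `width` bits are tracked.
    """
    mask = (1 << width) - 1
    value, mu, trace = p & mask, 0, []
    for k in range(width):
        if (value >> k) & 1:
            mu |= 1 << k
            value ^= (n << k) & mask
        trace.append(value)
    return mu, trace


# P = 1111 and N = 1011 (x^3 + x + 1), as in the example.
mu, trace = reduce_low_digit(0b1111, 0b1011)
```

For these inputs the trace holds 0100, 0100, 1000, and 0000 for columns X0 through X3, and mu collects the latch states as 1101.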

During the reduction process that occurs in the first
multiplication cycle, the first N bits of the partial
product of digits A0 and B0 are reduced to having values
of zero. Latches 128, 132, 136, and 140 have been set and
contain the value for μ of 1101 that is used in subsequent
pipelined multiplications for determining the product of
operands A and B. Following the reduction of the first
partial product to zero, the signal T transitions from a
logic one to a logic zero value and stores the value of μ
in latches 128, 132, 136, and 140. The stored value of μ,
a value for N(i+3), N(i+2), N(i+1), and N(i+0) of 0000,
and a value for P(i+3), P(i+2), P(i+1), and P(i+0) of 0000
are used by multiplier structure 171 to complete the
polynomial reduction process. The signals at output
terminals 170, 168, 166, and 164 have a respective value
of 0111, e.g., a value represented as (x^2 + x + 1) in
polynomial form, after the second multiplication cycle has
completed.
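
The two cycles can be collapsed into one arithmetic sketch in Python: form P + mu*N over GF(2) and divide by R. The function name montgomery_reduce_gf2 is hypothetical, and the sketch assumes the constant term of N is one so that every low bit can be cleared.

```python
def montgomery_reduce_gf2(p: int, n: int, r_bits: int):
    """Return (mu, (p + mu*n) / x^r_bits) over GF(2).

    Assumes the constant term of n is one, so each low bit can be
    cleared by XORing in a shifted copy of n.
    """
    mu = 0
    for k in range(r_bits):
        if (p >> k) & 1:
            mu |= 1 << k
            p ^= n << k          # polynomial addition, no carries
    return mu, p >> r_bits       # low r_bits bits are now zero


# P = 1111, N = 1011 (x^3 + x + 1), R = x^4.
mu, reduced = montgomery_reduce_gf2(0b1111, 0b1011, 4)
```

With the example operands this yields mu = 1101 and a reduced value of 0111, the same values the text reports for the latches and for the output terminals after the second cycle.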

Briefly referring to FIG. 4, the modular polynomial
multiplication of (A*R mod N) and (B*R mod N) produces the
same binary product as found using the circuitry of FIG.
3. When calculating modular polynomial-basis
multiplications, AND-gates 90A-96A, 100A-106A, 110A-116A,
and 120A-126A are not enabled. Therefore, adders 90-96,
adders 100-106, adders 110-116, and adders 120-126 do not
propagate a carryin signal to adjacent adder cells. The
disabled AND-gates cause a logic zero value to be supplied
at each of the CI terminals.

During the first multiplication cycle, the reduction
process causes a value of 0000 to be generated as the
first partial product of digits A0 and B0 at output
terminals 170, 168, 166, and 164. In addition, latches
224, 220, 216, and 212 are set during the generation of
the first partial product and the latches retain the value
for μ of 1101 that is used in subsequent pipelined
multiplications. During the second multiplication cycle,
the signals generated at output terminals 170, 168, 166,
and 164 have a respective binary value of 0111 or a value
of (x^2 + x + 1) in polynomial form.

It should be noted that the architecture of multiplier
structure 171 allows the value of μ to be determined and
stored in latches 212, 216, 220, and 224. In other words,
the value of μ is not calculated prior to the
multiplication of the operands A and B, but rather the
value of μ is determined and latched during the cycle that
determines the multiplication of the digits A0 and B0. The
latched value of μ is used during the multiplication of
the other digits in the pipelined process that determine
the full product of the operands A and B.

FIG. 5 is a schematic diagram showing a portion of a
multiplier 232 for computing modular polynomial-basis
multiplications. Briefly referring to FIG. 4, AND-gates
90A-96A, 100A-106A, 110A-116A, and 120A-126A are not
enabled when multiplier structure 171 is used for
computing modular polynomial-basis multiplications.
Therefore, adder cells do not receive a carryin signal
from the carryout (CO) terminal of an adjacent adder cell.
Accordingly, the full adder cell of adders 90-96, 100-106,
110-116, and 120-126 can be replaced by a half adder cell
as illustrated in FIG. 5. The letter "H" has been appended
to the reference number of the exclusive-OR gates used as
the half adder cells.

For the example where (A*R mod N) = (x^6 + x^4) mod N,
(B*R mod N) = (x^6) mod N, A = (x^2 + 1), B = (x^2),
R = (x^4), and N = (x^3 + x + 1), the polynomial
multiplication of (A*R mod N) and (B*R mod N) produces a
value of 0000 at the respective output terminals 170, 168,
166, and 164 during the first multiplication cycle. Thus,
the first partial product is reduced to zero and the value
of μ is determined as having a value of 1101 and stored in
respective latches 224, 220, 216, and 212. The stored
value of μ is used during subsequent multiplication cycles
that generate the full product of operands A and B. The
signals at output terminals 170, 168, 166, and 164 have a
respective binary value of 0111, e.g., a value of
(x^2 + x + 1) in polynomial form during the second
multiplication cycle.

FIG. 6 is a block diagram that illustrates a 1 x M
multiplier 240 for computing either integer-modulo-N
multiplications or modular polynomial-basis
multiplications, where M is the number of multiplier
cells. Multiplier 240 has a B-register 242 for storing
operand B, an A-register 244 for storing operand A, a
C-register 246 for computing and storing a product value,
and an N-register 248 for storing a value of N. Although a
reset line is not shown, C-register 246 is initially
cleared prior to the first multiplication cycle. It should
be noted that N-register 248 stores a binary value having
an odd integer value when multiplier 240 computes
integer-modulo-N multiplications and a binary value for an
irreducible polynomial when multiplier 240 computes
modular polynomial-basis multiplications. Registers
242-248 are illustrated in FIG. 6 as M-bit wide registers.

B-register 242, in the preferred embodiment, is a shift
register that shifts the data stored in that register
either to the left or to the right. By way of example,
B-register 242 shifts data to the right when multiplier
240 computes integer-modulo-N multiplications, i.e.,
data-bits of B-register 242 are transferred to mux 250
starting with the least-significant data-bits of
B-register 242. On the other hand, B-register 242 shifts
data to the left when multiplier 240 computes modular
polynomial-basis multiplications, i.e., data-bits of
B-register 242 are transferred to mux 250 starting with
the most-significant data-bits of B-register 242. The
clock signals used to latch values in B-register 242,
A-register 244, C-register 246, and N-register 248 are not
shown in FIG. 6. Also, the bus lines connected to inputs
and outputs of each register that allow data to be
transferred to and retrieved from the registers are not
shown.

Multiplier 240 computes either integer-modulo-N
multiplications or modular polynomial-basis
multiplications based on the logic state of the signal at
the INT/POLY input. The INT/POLY input is connected to the
select input of a multiplexer (mux) 250, to B-register
242, and to an input of the adder cells of C-register 246
(see input INT/POLY in FIGs. 7 and 8). Thus, when the
signal at the INT/POLY input causes multiplier 240 to
compute modular polynomial-basis multiplications,
B-register 242 operates to shift data to the left,
presenting the data from the most significant data-bit
position of B-register 242 through mux 250 to inputs of
C-register 246. When multiplier 240 computes
integer-modulo-N multiplications, B-register 242 operates
to shift data to the right, presenting the data from the
least significant data-bit position of B-register 242
through mux 250 to inputs of C-register 246.
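
The effect of the shift direction on the order in which bits of operand B are presented can be sketched as a Python generator. The name b_register_bits and the boolean int_mode flag are assumptions for illustration; they stand in for the INT/POLY control and do not reflect its actual signal polarity.

```python
def b_register_bits(b_value: int, width: int, int_mode: bool):
    """Yield the bits of operand B in the order they reach the mux.

    int_mode=True models the shift-right case (least significant bit
    first); int_mode=False models the shift-left case (most
    significant bit first).
    """
    order = range(width) if int_mode else range(width - 1, -1, -1)
    for position in order:
        yield (b_value >> position) & 1


integer_order = list(b_register_bits(0b11101, 5, int_mode=True))
poly_order = list(b_register_bits(0b11101, 5, int_mode=False))
```

One bit is produced per multiplication cycle in either mode; only the scan direction differs.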

FIG. 7 is a schematic diagram of a cell 270 that is used
in C-register 246 of multiplier 240 (FIG. 6) for
single-cycle multiplication operations. Although
multiplier 240 is illustrated as a ripple-carry
multiplier, it should be understood that multiplier 240
could be implemented as a carry-save multiplier. Thus,
cells C(n-1), C(n-2), ..., and C0 of C-register 246
incorporate cell 270 in computing modular polynomial-basis
and integer-modulo-N multiplications. A logic zero at the
input INT/POLY of cell 270 causes cell 270 to compute the
modular polynomial-basis. Latch 262 in cell 270 latches
the "ith" bit, storing the value
(Ai*Bi ⊕ Ni*CHIGH ⊕ CARRYIN(i-1)), where Ai, Bi, and Ni
are values stored at a particular bit location (designated
as bit location i) of A-register 244, B-register 242, and
N-register 248, respectively. CHIGH is the value of the
most significant data bit that is stored in C-register
246, and C(i-1) is the previous partial product value that
is stored in the register cell that is adjacent to the
"ith" bit in C-register 246.

On the other hand, when multiplier 240 is selected for
computing integer-modulo-N multiplications, latch 262 of
cell 270 latches the value of
(Ai*Bi ⊕ CARRYIN(i-1) ⊕ CARRYIN(i-2) ⊕ C(i-1) ⊕ Ni*CLOW),
where Ai, Bi, and Ni are values stored at the "ith" bit
location of A-register 244, B-register 242, and N-register
248, respectively. CLOW is the value of the least
significant data bit that is stored in C-register 246.
CARRYIN(i-1) is the carry signal that propagates from the
adder cell that is adjacent to the "ith" bit in C-register
246. CARRYIN(i-2) is the carry signal propagated from an
adder cell that is two cells removed from the "ith" bit in
C-register 246. C(i-1) is a previous partial product value
that is stored in a latch that is adjacent to the "ith"
bit in C-register 246.

In operation, the multiplication of operand A by operand B
in integer form for integer-modulo-N multiplications is
accomplished in multiple multiplication cycles. Data is
shifted from B-register 242, one data bit each
multiplication cycle, to C-register 246. Thus, C-register
246 performs the multiplication of operands A and B and
reduces that product by multiples of N to generate the
value (A*B*R^-1 mod N). Thus, in the first multiplication
cycle, the least significant data bit of operand B is
shifted through mux 250 to C-register 246. In the next
multiplication cycle, the shift right operation of
B-register 242 causes the next least significant data bit
to be transferred through mux 250 to C-register 246. The
multiplication process continues until B-register 242 has
shifted the stored value of operand B through mux 250, one
data bit per multiplication cycle, to C-register 246 and
C-register 246 generates the product (A*B*R^-1 mod N).

It should be noted that the multiplication of operand A,
having the form (A*R mod N), with operand B, also having
the form (B*R mod N), generates the product (A*B*R mod N)
in reduced form. In other words, the product is reduced by
R. By way of example, the (A*R mod N) term having a value
of 10110 is stored in A-register 244, the (B*R mod N) term
having a value of 10101 is stored in B-register 242, and
the N term having a value of 11101 is stored in N-register
248. Initially, C-register 246 is cleared, causing the
previous partial product C(i-1) to have a value of zero.
In this example, multiplier 240 generates the product
(A*B*R mod N) having the value (1001).

Specifically, the first partial product is generated by
multiplying the value stored in A-register 244 by the
least significant data bit from B-register 242. Thus,
A-register 244 has a value (10110) that is multiplied by
B(0), i.e., the least significant bit of B and a logic one
value (10101).

(1)    10110 <== value stored in A-register 244
(2)  x 10101 <== B(0), least significant bit of B
(3)    10110 <== first bit multiply



Using the Foster-Montgomery Reduction Algorithm, the logic
value of the data in a particular bit location of the
partial product determines whether the value of N should
be aligned and added to the partial product to reduce the
value of the partial product for mod N. When the
particular bit location has a logic zero value, then the
value of N is not added to the partial product. On the
other hand, the value of N is added to the partial product
when the particular bit location has a logic one value. In
this example, the particular bit location is the least
significant bit of the first bit multiply (10110). A logic
zero value is in this location and accordingly, the value
of N is not added to the first bit
multiply (3).

The second bit multiply involves the multiplication of the
value stored in A-register 244 by the next least
significant bit from B-register 242. Thus, the value in
A-register 244 (10110) is multiplied by B(1), i.e., the
next least significant data bit of B and a logic zero
value (10101).

(1)    10110 <== value stored in A-register 244
(4)  x 10101 <== B(1), next least significant bit
(5)    00000 <== second bit multiply result

The product of the second bit multiply (5) is summed with
the stored previous result (3) to generate the second
partial product (6).

(5)    00000 <== second bit multiply
(3)  + 10110 <== first partial product
(6)    10110 <== second partial product

In the Foster-Montgomery Reduction Algorithm, the logic
value of the particular bit location of the second partial
product determines whether the second partial product
should be reduced. In this case, the particular bit
location is the location just to the left of the least
significant data bit (10110). The second data bit has a
logic one value and accordingly, the value of N is aligned
and added to the second partial product. In other words,
the second partial product is reduced by the addition of N
aligned at the particular bit location.

(6)      10110 <== second partial product
(7)  +  11101  <== aligned value of N
(8)    1010000 <== reduced second partial product



WO 00/38047 



PCT/US99/02451 




The third bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(2),
i.e., the value of the data bit located in the third bit
location (10101) from the right in B-register 242.

(1)    10110 <== value stored in A-register 244
(9)  x 10101 <== B(2), next least significant bit
(10)   10110 <== third bit multiply result

Following the third bit multiply, the product of the third
bit multiply (10) is added to the previous result (8) to
provide the third partial product (11).

(8)     1010000 <== previous result
(10) +  10110   <== third bit multiply
(11)   10101000 <== third partial product

The logic value of the particular bit location of 
the third partial product determines whether the third 
partial product should be reduced. In this example, the 
particular bit location is the third bit location from 
the right (10101000) . The third data bit has a logic 
zero value and accordingly, the value of N is not aligned 
and added to the third partial product. 

The fourth bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(3),
i.e., the value of the data bit located in the fourth bit
location (10101) from the right in B-register 242.

(1)    10110 <== value stored in A-register 244
(12) x 10101 <== B(3), next least significant bit
(13)   00000 <== fourth bit multiply result

Following the fourth bit multiply, the fourth bit multiply
result is added to the third partial product (11) to
provide the fourth partial product (14).

(11)   10101000 <== third partial product
(13) + 00000    <== fourth bit multiply result
(14)   10101000 <== fourth partial product



The logic value of the particular bit location of the
fourth partial product determines whether the fourth
partial product should be reduced. In this example, the
particular bit location is the fourth bit location from
the right (10101000). The fourth data bit has a logic one
value and accordingly, the value of N is aligned and added
to the fourth partial product.

(14)    10101000 <== fourth partial product
(15) +  11101    <== aligned value of N
(16)   110010000 <== reduced fourth partial product

The fifth bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(4),
i.e., the value of the data bit located in the fifth bit
location (10101) from the right in B-register 242.

(1)    10110 <== value stored in A-register 244
(17) x 10101 <== B(4), next least significant bit
(18)   10110 <== fifth bit multiply result




Following the fifth bit multiply, the fifth bit multiply
result is added to the reduced fourth partial product (16)
to provide the fifth partial product (19).

(16)    110010000 <== reduced fourth partial product
(18) +  10110     <== fifth bit multiply result
(19)   1011110000 <== fifth partial product

Again, the logic value of the particular bit location of
the fifth partial product determines whether the fifth
partial product should be reduced. In this example, the
particular bit location is the fifth bit location from the
right (1011110000). The fifth data bit has a logic one
value and accordingly, the value of N is aligned and added
to the fifth partial product.

(19)    1011110000 <== fifth partial product
(20) +   11101     <== the value of N properly aligned
(21)   10011000000 <== reduced fifth partial product



The product of (A*R mod N) and (B*R mod N), i.e., (10110)
and (10101), has a value that is greater than the value of
N. When the reduced final partial product has a value that
is greater than N, then the value of N is subtracted from
that final partial product. In other words, the value of N
(11101) is aligned and subtracted from the reduced partial
product (10011000000). It should be noted that the 1 x M
multiplier 240 has been used in computing the final
product (A*B*R mod N) having a value of 1001.
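
The bit-by-bit procedure worked through above can be summarized in a short Python sketch. It is an arithmetic model only, not a description of the register cells; the function name montgomery_bit_serial and the returned trace list are assumptions introduced for illustration, and the final comparison uses "greater than or equal" so that the result always lies below N.

```python
def montgomery_bit_serial(a: int, b: int, n: int, bits: int):
    """Bit-serial Montgomery product a*b*2^(-bits) mod n (n odd).

    Returns the final result and the running sum after each step's
    optional addition of the aligned modulus.
    """
    total, trace = 0, []
    for k in range(bits):
        if (b >> k) & 1:
            total += a << k          # k-th bit multiply, aligned
        if (total >> k) & 1:
            total += n << k          # add aligned N to clear bit k
        trace.append(total)
    result = total >> bits           # divide by R = 2^bits
    if result >= n:
        result -= n                  # final conditional subtraction
    return result, trace


result, trace = montgomery_bit_serial(0b10110, 0b10101, 0b11101, 5)
```

For the operands of the example, the trace reproduces the values 10110, 1010000, 10101000, 110010000, and 10011000000 from lines (3), (8), (11), (16), and (21), and the result is 1001.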

The value of μ in the Foster-Montgomery Reduction
Algorithm is not computed prior to the multiplication of
the operands A and B but, as noted in the previous
example, the value of μ is determined while the product of
the digits A0 and B0 is being reduced. It should be noted
that the value for N is odd, i.e., the value of N has a
logic one value in the position for the least significant
bit. Thus, by adding N to the summed value when the logic
value of the particular bit location has a logic one
value, the value (A*B*R mod N) is generated having a
number of zeros in the lower bit locations. Put another
way, the Foster-Montgomery Reduction Algorithm causes the
least significant bit locations to have logic zero values
in generating a product that is reduced by the value R.
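
The role of the odd modulus can be illustrated with a minimal Python sketch. The function name clear_low_bits is hypothetical; the sketch only demonstrates the property stated above, namely that adding aligned copies of an odd N forces the low bit locations to zero without changing the value modulo N.

```python
def clear_low_bits(total: int, n: int, bits: int) -> int:
    """Add aligned copies of odd n until the low `bits` bits are zero."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    for k in range(bits):
        if (total >> k) & 1:
            total += n << k   # bit k of (n << k) is one, so bit k clears
    return total


cleared = clear_low_bits(0b10110, 0b11101, 5)
```

Because every added term is a multiple of N, the returned value is congruent to the input modulo N, and the division by R that follows is exact.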

Referring to FIGs. 6 and 7, the product (A*B) mod N can be
generated to support ECC (F2^M in the polynomial-basis),
where A and B are finite field elements representing the
coordinates of the elliptic curve and N is the irreducible
or basis polynomial. The number of multiplication cycles
required to generate the product depends, in part, on the
number of bits stored in B-register 242. Data is shifted
from B-register 242, one data bit at a time, to C-register
246. Thus, C-register 246 performs the multiplication of
operands A and B and reduces that product by multiples of
N in generating the value A*B mod N. Since a carry signal
is not propagated between adder cells when multiplier 240
is computing modular polynomial-basis multiplications, the
calculation of modular polynomial-basis multiplications
can begin by multiplying the most significant data bit
from A-register 244 with the most significant data bit
from B-register 242. This eliminates the necessity of
putting the operands into the Montgomery format, i.e.,
A -> A*R mod N. B-register 242 shifts data bits, starting
with the most significant data bits, through mux 250 to
C-register 246.
The multiplication of the value stored in A-register 244
by the most significant data bit stored in B-register 242,
i.e., the value B(4), generates the first partial product.
Thus, by way of example, A-register 244 has a binary value
10110 (x^4 + x^2 + x, in polynomial form) that is
multiplied by B(4), i.e., the binary one value in the most
significant bit location of 11101 (x^4, in polynomial
form). The irreducible polynomial N has a value of 100101
(x^5 + x^2 + 1, in polynomial form).



(1)   10110 <== value stored in A-register 244
(2) x     1 <== B(4), most significant bit of 11101
(3)   10110 <== first partial product result

The first partial product is added to a previous partial
product, initially having a value of zero based on a reset
of C-register 246, providing a summed value of 10110. In
the next multiplication cycle, the data in B-register 242
is shifted to the left and the next most significant data
bit of B-register 242 is transferred through mux 250 to
C-register 246. C-register 246 multiplies the value stored
in A-register 244 by the next most significant data bit.
Thus, the binary value 10110 (x^4 + x^2 + x, in polynomial
form) is multiplied by B(3), i.e., the binary one value in
the second bit location from the left of 11101 (x^3, in
polynomial form).

(1)   10110 <== value stored in A-register 244
(4) x     1 <== B(3), next most significant bit of 11101
(5)   10110 <== second bit multiply result

The second bit multiply result (5) is summed with the
stored previous result to generate the second partial
product (6).

(3)   10110  <== first partial product
(5) +  10110 <== second bit multiply result
(6)   111010 <== second partial product
The logic value of a particular bit location is tested to
determine whether the partial product should be reduced.
When the data bit at the particular bit location has a
logic one value, the value of N is aligned to that
particular bit location and added to the partial product.
In this case, the particular bit location is the most
significant data bit location of the generated second
partial product. The data bit at that location has a logic
one value (the leading bit of 111010). Therefore, the value
of N is aligned (x^3*N) and subtracted from the most
significant data bit location.

It should be noted that when computing modular
polynomial-basis multiplications, multiplier 240 does not
propagate a carry signal and, therefore, the operation of
"adding" or "subtracting" is an exclusive-OR of the two
values. It should be further noted that the most
significant data location of the second partial product
is reduced to a zero value by the addition of N.



(6)   111010 <== second partial product
(7) - 100101 <== aligned value of N (x^8 + x^5 + x^3)
(8)   011111 <== reduced second partial product
                 (x^7 + x^6 + x^5 + x^4 + x^3)
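The carry-free reduction of the second partial product can be illustrated with a few lines of Python. The sketch represents each polynomial as an integer bit mask, which is an assumption of the illustration; the helper name `gf2_add` is hypothetical.

```python
def gf2_add(p, q):
    """Add (or, equivalently, subtract) two GF(2) polynomials held as bit masks."""
    return p ^ q


second_partial = 0b111010
aligned_n = 0b100101
reduced = gf2_add(second_partial, aligned_n)
print(format(reduced, "06b"))  # 011111

# Applying the aligned N a second time restores the original value, which is
# why "adding" and "subtracting" are the same operation without a carry.
assert gf2_add(reduced, aligned_n) == second_partial
```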



The third bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(2),
i.e., the value of the data bit located in the third bit
location (11101) from the left in B-register 242.



(1)    10110 <== value stored in A-register 244
(9)  x     1 <== B(2), next most significant bit of 11101
(10)   10110 <== third bit multiply result
Following the third bit multiply, the product of the third
bit multiply (10) is added to the previous result, i.e.,
the reduced second partial product (8), to provide the
third partial product (11).

(8)    011111  <== reduced second partial product
(10) +   10110 <== third bit multiply (x^6 + x^4 + x^3)
(11)   0101000 <== third partial product (x^7 + x^5)



The logic value of the particular bit location of the
third partial product determines whether the third partial
product should be reduced. In this example, the particular
bit location is the second bit location from the left
(0101000). The second data bit has a logic one value and,
accordingly, the value of N is aligned (x^2*N) and
subtracted from the third partial product.

(11)   0101000 <== third partial product (x^7 + x^5)
(12) -  100101 <== aligned value of N (x^7 + x^4 + x^2)
(13)   0001101 <== reduced third partial product
                   (x^5 + x^4 + x^2)



The fourth bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(1),
i.e., the value of the data bit located in the fourth bit
location (11101) from the left in B-register 242.

(1)    10110 <== value stored in A-register 244
(14) x     0 <== B(1), next most significant bit of 11101
(15)   00000 <== fourth bit multiply result
Following the fourth bit multiply, the fourth bit multiply
result (15) is added to the reduced third partial product
(13) to provide the fourth partial product (16).

(13)   0001101  <== reduced third partial product
(15) +    00000 <== fourth bit multiply result
(16)   00011010 <== fourth partial product
                    (x^5 + x^4 + x^2)



The logic value of the particular bit location of the
fourth partial product determines whether the fourth
partial product should be reduced. In this example, the
particular bit location is the third bit location from the
left (00011010). The third data bit has a logic zero value
and, accordingly, the value of N is not added to the fourth
partial product.

The fifth bit multiply involves the multiplication of the
value stored in A-register 244 by the logic value of B(0),
i.e., the value of the data bit located in the fifth bit
location (11101) from the left in B-register 242.



(1)    10110 <== value stored in A-register 244
(17) x     1 <== B(0), least significant bit of 11101
(18)   10110 <== fifth bit multiply result

Following the fifth bit multiply, the fifth bit multiply
result (18) is added to the reduced fourth partial product
(16) to provide the fifth partial product (19).



(16)   00011010  <== reduced fourth partial product
(18) +     10110 <== fifth bit multiply result
(19)   000100010 <== fifth partial product (x^5 + x)

The logic value of the particular bit location of the
fifth partial product determines whether the fifth partial
product should be reduced. In this example, the particular
bit location is the fourth bit location from the left
(000100010). The fourth data bit has a logic one value and,
accordingly, the value of N is aligned and subtracted from
the fifth partial product.

(19)   000100010 <== fifth partial product
(20) -    100101 <== the value of N properly aligned
(21)   000000111 <== reduced fifth partial product
                     (x^2 + x + 1)

The multiplication process continues until B-register 242
has shifted the stored value of operand B through mux 250,
one data bit per multiplication cycle, to C-register 246
and C-register 246 has generated the product (A*B mod N).
The (A mod N) term, having a binary value of 10110 (x^4 +
x^2 + x, in polynomial form), is multiplied with the (B mod
N) term, having a binary value of 11101 (x^4 + x^3 + x^2 +
1, in polynomial form), to generate the binary value of
000000111 (x^2 + x + 1, in polynomial form).
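The polynomial-basis example above can be reproduced with a minimal Python sketch of a most-significant-bit-first shift-and-XOR multiplication. The sketch is a software model under stated assumptions and not the circuit itself: polynomials are held as integer bit masks, the number of cycles is taken to equal the degree of N, and the function name `gf2_mulmod_msb_first` is hypothetical.

```python
def gf2_mulmod_msb_first(a, b, n):
    """Multiply GF(2) polynomials a and b modulo n, one bit of b per cycle.

    Returns the product and the running value after each cycle. Assumes
    the degrees of a and b are below the degree of n.
    """
    degree = n.bit_length() - 1   # degree of N; 5 in the example
    c = 0
    trace = []
    for i in reversed(range(degree)):
        c <<= 1                   # previous result moves up one bit location
        if (b >> i) & 1:
            c ^= a                # carry-free add of the bit multiply result
        if (c >> degree) & 1:
            c ^= n                # tested bit is one: reduce by the aligned N
        trace.append(c)
    return c, trace


value, trace = gf2_mulmod_msb_first(0b10110, 0b11101, 0b100101)
print([bin(t) for t in trace])    # ['0b10110', '0b11111', '0b1101', '0b11010', '0b111']
print(bin(value))                 # 0b111
```

The running values correspond to results (3), (8), (13), (16), and (21) of the example, and the final value 111 is x^2 + x + 1.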

FIG. 8 is a schematic diagram of another cell that can be
used in all bit locations of C-register 246 of multiplier
240 (FIG. 6) for two-cycle multiplication operations.
Referring to FIG. 6, cell 280 (FIG. 8) describes the logic
for cells C(n-1), C(n-2), ..., and C0 of C-register 246. A
logic zero at input INT/POLY of multiplier 240 selects the
multiplier for computing modular polynomial-basis
multiplications. Referring to FIG. 8, a latch in cell 280
latches the value (Ai*Bi ⊕ Ni*CHIGH ⊕ C(i-1)), where Ai,
Bi, and Ni are values stored at a particular bit location
(designated as bit location i) of A-register 244,
B-register 242, and N-register 248, respectively. CHIGH is
the value of the most significant data bit that is stored
in C-register 246. C(i-1) is the previous partial product
from an adder cell that is located adjacent to the "ith"
cell in C-register 246.
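The latched expression for the polynomial-basis mode can be modeled one cell at a time in Python. This is a sketch under several assumptions that are not stated in the text: the B term is taken to be the single bit currently shifted out of B-register 242, the most significant C bit and the neighboring C bit are taken to be the values held before the clock edge, and the register width equals the degree of N. The names `poly_cell` and `poly_multiply_with_cells` are hypothetical.

```python
def poly_cell(a_i, b_bit, n_i, c_high, c_prev):
    """One cell in polynomial mode: (Ai AND B) XOR (Ni AND Chigh) XOR C(i-1)."""
    return (a_i & b_bit) ^ (n_i & c_high) ^ c_prev


def poly_multiply_with_cells(a, b, n, width):
    """Clock a row of `width` cells once per bit of b, most significant first."""
    c = [0] * width                      # c[i] holds bit location i
    for j in reversed(range(width)):
        b_bit = (b >> j) & 1
        c_high = c[width - 1]            # value held before this clock edge
        c = [
            poly_cell((a >> i) & 1, b_bit, (n >> i) & 1, c_high,
                      c[i - 1] if i > 0 else 0)
            for i in range(width)
        ]
    return sum(bit << i for i, bit in enumerate(c))


print(bin(poly_multiply_with_cells(0b10110, 0b11101, 0b100101, 5)))  # 0b111
```

Under these assumptions the row of cells yields the same value, 111, as the worked polynomial-basis example with A = 10110, B = 11101, and N = 100101.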

On the other hand, when multiplier 240 (FIG. 6) is selected
for computing integer-modulo-N multiplications, cell 280
latches the value (Ai*Bi ⊕ CARRYIN0(i-1) ⊕ C(i-1)), where
Ai and Bi are values stored at a particular bit location
(designated as bit location i) of A-register 244 and
B-register 242, respectively. CARRYIN0(i-1) is the carry
signal that propagates from the adder cell that is located
adjacent to the "ith" cell in C-register 246. C(i-1) is a
previous partial product value that is stored in the adder
cell that is located adjacent to the "ith" cell in
C-register 246.
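The practical difference between the two modes is whether a carry is allowed to pass from one adder cell to the next. A minimal Python sketch of a ripple adder with a gated carry illustrates this. The use of an AND gate as the blocking element, the polarity (a logic one passing the carry), and the names `full_adder` and `gated_add` are assumptions of the sketch.

```python
def full_adder(x, y, carry_in):
    """Return (sum, carry_out) of a one-bit full adder."""
    s = x ^ y ^ carry_in
    carry_out = (x & y) | (carry_in & (x ^ y))
    return s, carry_out


def gated_add(x, y, width, int_poly):
    """Add x and y over `width` cells; int_poly = 0 blocks every carry."""
    result, carry = 0, 0
    for i in range(width):
        s, carry_out = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
        carry = carry_out & int_poly   # blocking gate between adjacent cells
    return result


print(gated_add(0b111010, 0b100101, 7, 1))                  # 95, integer sum
print(format(gated_add(0b111010, 0b100101, 7, 0), "06b"))   # 011111, XOR
```

With the carry passed, the cells form an ordinary integer adder; with the carry blocked, the same cells compute the exclusive-OR used for polynomial-basis arithmetic.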

If the least significant data bit (LSB) that is latched in
C-register 246 (FIG. 6) has a logic one value, then a
second multiplication cycle is used to determine Ci ⊕ Ni ⊕
CARRYIN0(i-1) and cause a reduction of the generated
partial product. This is indicated by the REDUCED input
signal having a logic one value. Ni is a value stored at a
particular bit location (designated as bit location i) of
N-register 248. Thus, the first multiplication cycle
computes the partial product of Ai*Bi, and depending on the
calculated partial product, the second multiplication cycle
reduces the partial product. A feedback path provides the
value of Ci to mux 282 and a conduction path provides the
value of Ni through mux 284 to inputs of full adder 286
during the second multiplication cycle. On average, the
second multiplication cycle is needed about 50 percent of
the time in generating the reduced product (A*B*R^-1 mod
N).

By now it should be appreciated that the present invention
provides a cryptographic multiplication system that
achieves high performance, low cost, and low power for
implementation in an integrated circuit. The hardware
multiplier achieves high performance by computing a product
of two operands to support the RSA and ECC algorithms. The
multiplication system is adaptable to large operands and
performs calculations in fewer clock cycles than prior art
systems.
CLAIMS

1. An integrated circuit (24) for processing cryptographic
functions, comprising an arithmetic processor (22) having a
data input coupled for receiving data and a select input
coupled for receiving a select signal (INT/POLY), wherein
the arithmetic processor processes the data according to a
Rivest-Shamir-Adleman (RSA) algorithm when the select
signal has a first value and the arithmetic processor
processes the data according to an Elliptic Curve
Cryptography (ECC) algorithm when the select signal has a
second value.

2. The integrated circuit of claim 1, further including a
modulo reducer (60), the modulo reducer comprising:

a first adder cell (90) having first and second input
terminals coupled for receiving respective first and second
data signals;

a logic gate (90A) having a first input coupled for
receiving an enable signal (INT/POLY) and a second input
coupled to a carryout terminal (CO) of the first adder
cell; and

a second adder cell (92) having first and second input
terminals coupled for receiving respective third and fourth
data signals, and a carryin (CI) terminal coupled to an
output of the logic gate.

3. The integrated circuit of claim 2, wherein the enable
signal (INT/POLY) having a first value blocks a carryout
signal at the carryout terminal (CO) of the first adder
cell (90) from the carryin (CI) terminal of the second
adder cell (92).
4. An array of cells for calculating an Elliptic Curve
Cryptography (ECC) algorithm, wherein a first cell
comprises:

a first multiplexer (172) having first, second, third, and
fourth inputs coupled for receiving respective first,
second, third and fourth data signals, and first and second
selector inputs coupled for receiving respective first and
second selector signals that select one of the data signals
for transfer to an output of the first multiplexer; and

a first exclusive-OR gate (90H) having a first input
coupled to the output of the first multiplexer, a second
input coupled for receiving a first data bit P(0), and an
output for providing an output signal of the first cell.

5. The array of cells of claim 4, wherein the 
fourth data signal is a summed value of the second and 
third data signals. 

6. The array of cells of claim 4, wherein a second cell in
a same row of the array and adjacent to the first cell
comprises:

a second multiplexer (182) having first, second, third, and
fourth inputs coupled for receiving respective fifth,
sixth, seventh, and eighth data signals, and first and
second selector inputs coupled for receiving respective
third and fourth selector signals that select one of the
data signals for transfer to an output of the second
multiplexer; and

a second exclusive-OR gate (100H) having a first input
coupled to the output of the second multiplexer, a second
input coupled to the output of the first exclusive-OR gate
(90H), and an output for providing an output signal of the
second cell.
7. A method of using a common arithmetic processor (240)
for computing integer-modulo-N multiplications and the
Elliptic Curve Cryptography (ECC) algorithm in an
integrated cryptographic circuit, comprising the steps of:

providing values A, B, and N;

generating a value as A*B*R^-1 mod N as an operation of the
arithmetic processor (240), where A and B are integers, R
is a power of two having a value greater than N, and N is
an odd modulus; and

generating a value A*B*R^-1 mod N as an operation of the
arithmetic processor (240), where A and B are polynomials,
R is a power of two having a value greater than N, and N is
an irreducible polynomial.

8. A multiplier (171) having a plurality of interconnected
multiplier cells, wherein a first one of the multiplier
cells comprises:

a first adder (92) having a data input coupled for
receiving a first data signal, a second input coupled for
receiving a carryin signal, and an output that supplies a
data output signal; and

a blocking circuit (90A) having an input coupled for
receiving the carryin signal, an output coupled to the
second input of the first adder, and a control input
coupled for receiving a select signal (INT/POLY).

9. The multiplier of claim 8, wherein a first value of the
select signal passes the carryin signal to the second input
of the first adder (92) and a second value of the select
signal blocks the carryin signal from the second input of
the first adder.

10. The multiplier of claim 8, wherein the blocking circuit
includes a first logic gate having a first input coupled
for receiving the carryin signal, a second input coupled
for receiving the select signal, and an output coupled to
the second input of the first adder.